[pkg] Resolve directory symlinks in fetched targets#13792
[pkg] Resolve directory symlinks in fetched targets#13792ElectreAAS wants to merge 10 commits intoocaml:mainfrom
Conversation
df327f2 to
91d83ae
Compare
There was a problem hiding this comment.
Here is an example that I think causes a loop in the current algorithm:
/source/
dir_a/
link_to_b -> ../dir_b
dir_b/
link_to_a -> ../dir_a
Following this along we get an infinitely descending directory.
I think you will need to keep a set of resolved targets and check against it if you want to remember where you have been as to not repeat the resolution.
Another issue that I see is the multiple passes that are currently done, but we can improve this later once we have something that works correctly.
We also might need to add some validation that we don't escape the source directory, I don't think its a good idea to allow symlinks to / for example.
|
I just pushed work in progress to adress the latest comment about cycles. |
fe80747 to
5134ba5
Compare
5134ba5 to
b869da0
Compare
| This fails correctly | ||
| $ build_pkg bar 2>&1 | sanitize_pkg_digest bar.0.0.1 | tail -3 | ||
| Error: Unable to resolve symlink | ||
| _build/_private/default/.pkg/bar.0.0.1-DIGEST_HASH/source/dir_b/link_to_a/link_to_b, |
There was a problem hiding this comment.
CI isn't happy because it finds dir_a/link_to_b/link_to_a before the one I wrote in the test
|
Aside from the CI failures due to non-deterministic errors, the main code is ready for review. |
|
I haven't looked closely, but I think your changes might have made that test non-deterministic. |
c43ca48 to
6a8ab11
Compare
Alizter
left a comment
There was a problem hiding this comment.
Here are some problems I've observed:
-
There is a difference between local and tarball sources when it comes to directory symlinks. The tarball sources correctly resolve the contents wheras local sources silently discard them. I think rather than silently discarding them we should either raise a user error if this is something we wish not to support or support it. I would probably consider not supporting it.
-
Broken symlinks appear to be silently ignored. For example something stupid like a symlink to itself. We should probably error to the user in this case saying that we don't accept such symlinks.
| (* [raw_resolved] is a relative build path but it might contain indirections, | ||
| something like _build/foo/../bar | ||
| or _build/../outside *) | ||
| let canon_resolved = Path.of_string raw_resolved in |
There was a problem hiding this comment.
This only canonicalises relative paths and not absolute ones. I think Path.External.canonicalize_abs was the other way you did it.
There was a problem hiding this comment.
For some reason, raw_resolved is always a relative path, which is correctly canonicalized by Path.of_string
I ended up removing Path.External.canonicalize_abs anyway
|
Here are some of the issues we currently have: |
bc64415 to
71bc486
Compare
|
Update: we've looked further into this and it seems the desired behaviour in I've pushed a new version containing that deletion, along with a few WIP comments, I'm looking into them |
71bc486 to
7a3f5e6
Compare
bdb9bfd to
2ba65a9
Compare
|
I've extracted a few unnecessary changes to #14291 to make review as simple as possible. |
Just a little change extracted from #13792, nothing changing dune behaviour. Figured I'd use Fpath.traverse where it's appropriate
Just a trivial simplification in `Pkg_rules.source_files`, nothing changing dune behaviour. Instead of `map`ping the `relative` part later, do it once in the partition. This isn't particularly useful on its own, but in case of changes inside the partition function, it makes debugging a lot easier Extracted from #13792
2ba65a9 to
46866ea
Compare
| seen, if symlinks_in_children then Some Unix.S_DIR else None) | ||
| | { Unix.st_kind; _ } -> | ||
| (* We do not care about symlinks pointing to anything but directories. *) | ||
| seen, Some st_kind) |
There was a problem hiding this comment.
You say we don't care, but won't returning Some st_kind mean further processing by Fpath.traverse? Wouldn't it be better to actually skip it with None?
| seen, Some st_kind) | ||
| in | ||
| let _symlinks_seen : String.Set.t = | ||
| Fpath.traverse |
There was a problem hiding this comment.
Question about Fpath.traverse: Is it possible for this function to be "tricked" into reading more than it bargained for? i.e. can you create more things for it to read as it is reading?
There was a problem hiding this comment.
My follow up question is what's the difference between the two cycle detection mechanisms:
seenis_descendent
There was a problem hiding this comment.
Question about Fpath.traverse: Is it possible for this function to be "tricked" into reading more than it bargained for? i.e. can you create more things for it to read as it is reading?
Yes it's possible. If on_symlink returns a Some kind, traverse's handle_kind will be called on that kind. You could even create an infinite loop by making on_symlink return a directory that contains a symlink, that will then be turned into yet another directory...
What I'm doing is obviously not that pathological case, but since I am turning directory symlinks into regular directories, I do create more stuff for the traverse to read. This is intended
There was a problem hiding this comment.
My follow up question is what's the difference between the two cycle detection mechanisms:
*seen
*is_descendent
I just checked and it turns out that seen is useless (at least in the tests we have so far). It was necessary at some point when is_descendant didn't work as I'd thought, but not anymore.
I'll try to stress-test it a little more, but maybe it can be removed
There was a problem hiding this comment.
Yes, you can replace seen with () and simplify the rest a little bit more :-)
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
… source) Sort entries in fetch to guarantee determinism Remove the symlink resolution happening in pkg_rules as it's too complicated. The main change happening in fetch is still there. Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
The big comment explaining everything was moved to fetch. Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
…d function Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
Remove 'seen' as cycle detection, is_descendant is enough Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
46866ea to
cafeb8b
Compare
Signed-off-by: Ambre Austen Suhamy <ambre@tarides.com>
|
I tested this with |
|
Those OxCaml failures are real and are due to a new version of ppx_expect. It seems |
Signed-off-by: Ali Caglayan <alizter@gmail.com>
|
Annoyingly OxCaml is using ppx_expect v18, but v18 hasn't been released so we have a situation where the OxCaml version of the build is using a newer library that cannot overlap with the old version we have available. Luckily we can detect that and pass some compatibility flags to make it work. |
Housekeeping
This PR fixes the tests in #13393
#9873 will not be fixed, but still a significant step towards fixing #13678.
What this PR does
After fetching package sources, add a pass resolving directory symlinks. As they're not problematic at this stage, file symlinks are left as is. Broken symlinks are removed silently to preserve existing behaviour.
Note: both
portable_hardlinkandportable_symlinkwork backwards from what I initially understood, see #13791Done with the help of @Alizter, so thanks :)